Exploration of Projection Spaces

Data

To be able to explore paths in a projected space, you need to pick a problem/algorithm/model that consists of multiple states that change iteratively.

An example is the solving of a Rubik's Cube. After each rotation the state of the cube changes, resulting in a path from the initial state, through the individual rotations, to the solved cube. By projecting these states we can examine the individual states and paths in two-dimensional space. Depending on the initial state and the solution strategy, the paths will differ or resemble each other. The following example shows 10 randomly scrambled Rubik's Cubes solved with two different strategies, the Beginner's method (in green) and the Fridrich method (in orange):
Rubik's Cube Solving Strategies
You can see that although each cube is scrambled differently in the beginning, both strategies converge to the same paths after a few steps. You can also notice that the Beginner's method takes some additional paths that are not necessary with the Fridrich method.

Read and Prepare Data

Read in your data from a file or create your own data.

Document any data processing steps.

Comments

In addition to just the raw field counts (a0-a15), I extended the data by:
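Such derived features might look like the following sketch; the column names (`count_v`, `zeros`, `max_tile`) and the random stand-in data are illustrative, not the exact ones used here:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one row per game state, columns a0..a15 holding the
# tile exponents of the 4x4 board (0 = empty cell).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 6, size=(5, 16)),
                  columns=[f"a{i}" for i in range(16)])

fields = df[[f"a{i}" for i in range(16)]]

# Derived features: how many tiles of each value are on the board,
# plus simple scalar summaries of the state.
for v in range(1, fields.values.max() + 1):
    df[f"count_{v}"] = (fields == v).sum(axis=1)
df["zeros"] = (fields == 0).sum(axis=1)   # number of empty cells
df["max_tile"] = fields.max(axis=1)       # current highest tile
```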

Projection

Project your data into a 2D space. Try multiple (3+) projection methods (e.g., t-SNE, UMAP, MDS, PCA, ICA, other methods) with different settings and compare them.
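A minimal scikit-learn sketch for fitting several 2D projections side by side (random stand-in data; UMAP works analogously via the separate `umap-learn` package):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE, MDS

# Hypothetical feature matrix: one row per game state.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))

# Fit several 2D projections with fixed seeds so the layouts are reproducible
# and can be compared side by side.
projections = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "ICA": FastICA(n_components=2, random_state=0).fit_transform(X),
    "MDS": MDS(n_components=2, random_state=0).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
}
for name, Y in projections.items():
    print(name, Y.shape)  # each method yields one 2D point per state
```

Settings such as `perplexity` (t-SNE) or `n_neighbors` (UMAP) are the main knobs to vary when comparing methods.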

Make sure that all additional dependencies are included when submitting.

Failed visualization: t-SNE + PCA + ICA with different data

Better Visualization: 2D-SGD regression - predict step/zeros from state
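One way to read "2D-SGD regression" is to train two regressors that predict the step index and the zero count from the state, and then use the two predictions as the 2D coordinates. A sketch under that assumption (stand-in data, hypothetical names):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: field features X, plus per-state regression targets
# (step index and number of empty cells).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))
step = X @ rng.normal(size=16) + rng.normal(scale=0.1, size=300)
zeros = X @ rng.normal(size=16) + rng.normal(scale=0.1, size=300)

# Two SGD regressors, one per axis: the predicted step becomes the x
# coordinate, the predicted zero count the y coordinate of each state.
reg_x = make_pipeline(StandardScaler(), SGDRegressor(random_state=0)).fit(X, step)
reg_y = make_pipeline(StandardScaler(), SGDRegressor(random_state=0)).fit(X, zeros)
coords = np.column_stack([reg_x.predict(X), reg_y.predict(X)])
```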

Best visualization: 5-layer-autoencoder with *-10-2-10-* features

Note: PyTorch is not included in the dependencies by default. If you want to run this, please install it first.
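A sketch of such a *-10-2-10-* autoencoder in PyTorch; the activation functions, optimizer, and training length here are assumptions, not the exact settings used in this notebook:

```python
import torch
from torch import nn

n_features = 16  # e.g. the a0..a15 field values

# input -> 10 -> 2 -> 10 -> input; the 2-unit bottleneck is the 2D projection.
model = nn.Sequential(
    nn.Linear(n_features, 10), nn.Tanh(),
    nn.Linear(10, 2),                      # bottleneck = 2D coordinates
    nn.Tanh(),
    nn.Linear(2, 10), nn.Tanh(),
    nn.Linear(10, n_features),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # quadratic reconstruction loss

X = torch.randn(256, n_features)           # stand-in data
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), X)            # reconstruct the input
    loss.backward()
    opt.step()

# Extract the 2D embedding from the encoder half (layers up to the bottleneck).
with torch.no_grad():
    coords = model[:3](X)
```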

Comments

I tried different features and combinations at various points. Using a continuous helper (like steps, score, etc.) really helped, and in general the counts seem to be more stable than the fields, because for fields all values can change in a single move.

Most projections like UMAP and t-SNE don't work too well for this data, as follow-up states are often not adjacent in feature space, especially in the field space (the a-variables).

I manually experimented with perplexity and n_neighbors, but it didn't lead to good results.

For the autoencoder, I experimented with multiple learning rates until the results were good and training was fast.

The global structure shows that the state space widens (the more advanced the game is, the more the fields can differ), as well as "segments" of the game that coincide with the merging together of the next-bigger tile.

The local structure shows that the merge that leads to the next level of field complexity, due to a new highest tile, can happen at different times.

Connect the states that belong together.

The states of a single solution should be connected to see the path from the start to the end state. How the points are connected is up to you, for example, with straight lines or splines.
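A minimal matplotlib sketch connecting the projected states of each run with straight lines; `coords` and `line_id` are hypothetical arrays standing in for the projected states and their run membership:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for saving to file
import matplotlib.pyplot as plt

# Hypothetical layout: coords holds the 2D projection of every state (in play
# order); line_id says which solution/run each state belongs to.
rng = np.random.default_rng(0)
coords = rng.normal(size=(60, 2)).cumsum(axis=0)
line_id = np.repeat([0, 1, 2], 20)

fig, ax = plt.subplots()
for lid in np.unique(line_id):
    path = coords[line_id == lid]
    # Straight line segments from the start state to the end state of one run.
    ax.plot(path[:, 0], path[:, 1], marker=".", linewidth=1, alpha=0.6)
fig.savefig("paths.png")
```

Splines (e.g. `scipy.interpolate.splprep`) could replace the straight segments for smoother paths.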

See 2) Projection

Meta Data Encoding

Encode additional features in the visualization.

Use features of the source data and include them in the projection, e.g., by using color, opacity, different shapes or line styles, etc.

See 2) Projection

Added start and end markers plus colors for the line types. Also, due to transparency, thin lines, and drawing the shorter paths last, you can see more structure.
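A sketch of this encoding in matplotlib: one color per algorithm, distinct start/end markers, thin transparent lines, and the longer paths drawn first so the shorter ones stay visible (all names and data here are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Stand-in paths: one 2D trace per (hypothetical) algorithm.
rng = np.random.default_rng(1)
paths = {"greedy": rng.normal(size=(40, 2)).cumsum(axis=0),
         "random": rng.normal(size=(25, 2)).cumsum(axis=0)}
colors = {"greedy": "tab:orange", "random": "tab:green"}

fig, ax = plt.subplots()
# Longest paths first, so shorter ones are drawn last and stay on top.
for name, p in sorted(paths.items(), key=lambda kv: -len(kv[1])):
    ax.plot(p[:, 0], p[:, 1], color=colors[name],
            linewidth=0.8, alpha=0.4, label=name)   # thin, transparent line
    ax.scatter(*p[0], color=colors[name], marker="^", zorder=3)   # start marker
    ax.scatter(*p[-1], color=colors[name], marker="s", zorder=3)  # end marker
ax.legend()
fig.savefig("meta_encoding.png")
```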

Comments

In the end, the autoencoder provided a result very similar to the manually selected features, especially when combined with PCA. The PCA of the raw data is fairly similar, so the structure of the data seems mostly conserved, and even without autoencoding the result looks good after rescaling.

The features are encoded to minimize the reconstruction loss. Using a quadratic loss means the algorithm tries harder to avoid large errors, so features with a bigger range (score/step) seem to have a bigger impact than the reconstruction of the counters or a-states.

Rescaling (to log-space or uniform range) didn't make too much of a difference.

Optional

Projection Space Explorer (click to reveal)

Projection Space Explorer

The Projection Space Explorer is a web application to plot and connect two-dimensional points. Metadata of the data points can be used to encode additional information into the projection, e.g., by using different shapes or colors. Further information:
  • Paper: https://jku-vds-lab.at/publications/2020_tiis_pathexplorer/
  • Repo: https://github.com/jku-vds-lab/projection-space-explorer
  • Application: https://jku-vds-lab.at/projection-space-explorer/

Data Format

How to format the data can be found in the Projection Space Explorer's README. Example data with three lines, with two colors (algo) and additional mark encoding (cp):
x y line cp algo
0.0 0 0 start 1
2.0 1 0 state 1
4.0 4 0 state 1
6.0 1 0 state 1
8.0 0 0 state 1
12.0 0 0 end 1
-1.0 10 1 start 2
0.5 5 1 state 2
2.0 3 1 state 2
3.5 0 1 state 2
5.0 3 1 state 2
6.5 5 1 state 2
8.0 10 1 end 2
3.0 6 2 start 2
2.0 7 2 end 2
Save the dataset to CSV, e.g., using pandas: df.to_csv('data_path_explorer.csv', encoding='utf-8', index=False), and upload it in the Projection Space Explorer by clicking on `LOAD DATASET` in the top left corner. ℹ You can also include your high-dimensional data and use it to adapt the visualization.
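The required columns can be assembled with pandas, for example as follows (the coordinates are taken from the toy table above; the `cp` markers are derived per line):

```python
import pandas as pd

# Hypothetical projected coordinates of one solution path.
df = pd.DataFrame([(0.0, 0), (2.0, 1), (4.0, 4), (6.0, 1)], columns=["x", "y"])
df["line"] = 0
df["algo"] = 1

# Mark the first state of each line as 'start', the last as 'end'.
df["cp"] = "state"
df.loc[df.groupby("line").head(1).index, "cp"] = "start"
df.loc[df.groupby("line").tail(1).index, "cp"] = "end"

df = df[["x", "y", "line", "cp", "algo"]]
df.to_csv("data_path_explorer.csv", encoding="utf-8", index=False)
```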

Results

You may add additional screenshots of the Projection Space Explorer.

All graphs, additionally highlighting the end states and coloring according to the biggest tile; we can clearly see our game-segment theory confirmed.

The data used was the trace10_4.csv.autoencoded.csv we just saved in the last step.

All

Only the first 3 algorithms, so we can zoom in a bit further.

Random vs greedy vs lookahead 1

Interpretation

There seem to be events in the graph that open up a completely new state space. The distances between states become longer, which probably corresponds to the appearance of a new biggest tile. This leads to a wider state space, which is what makes the game harder: more differing tiles are lying around, and they can form more tile placements.

Most of the initial visualizations didn't show a lot, as the state space is in some ways more complex than a chess board or a Rubik's Cube, and the state changes more in one move than in chess or in cube solving.

Using more complex features like counts and steps/scores really helped to get the results I expected: a global visualization where you can see the difference in performance of the different algorithms, and the "global structure" of the game, where unlocking a bigger tile behaves like a "new level".

I didn't have many prior hypotheses; the data analysis was more exploratory, as the state space of 2048 isn't as well researched as chess or the Rubik's Cube. The major difference is that there is no real "end state" like winning that is clearly indicated rounds in advance, and there is no typical "end game area" in the state space of "nearly winning" states.

The global structure of the game with different state-space-segments was kind of interesting.

Submission

When you’ve finished working on this assignment please download this notebook as HTML and add it to your repository in addition to the notebook file.